AL-TR-1991-0128


AD-A262  672 



MEASURES  OF  SITUATION  AWARENESS: 
REVIEW  AND  FUTURE  DIRECTIONS  (U) 


Martin L. Fracker, Major, USAF

CREW  SYSTEMS  DIRECTORATE 
HUMAN  ENGINEERING  DIVISION 




OCTOBER  1991 


FINAL  REPORT  FOR  PERIOD  JANUARY  1990  TO  JANUARY  1991 




Approved for public release; distribution is unlimited.


98  4  08  056 


AIR  FORCE  SYSTEMS  COMMAND 
WRIGHT-PATTERSON AIR FORCE BASE, OHIO 45433-6573


NOTICES 

When US Government drawings, specifications, or other data are used for any purpose other than
a definitely related Government procurement operation, the Government thereby incurs no
responsibility  nor  any  obligation  whatsoever,  and  the  fact  that  the  Government  may  have 
formulated,  furnished,  or  in  any  way  supplied  the  said  drawings,  specifications,  or  other  data,  is 
not  to  be  regarded  by  implication  or  otherwise,  as  in  any  manner  licensing  the  holder  or  any  other 
person  or  corporation,  or  conveying  any  rights  or  permission  to  manufacture,  use,  or  sell  any 
patented  invention  that  may  in  any  way  be  related  thereto. 

Please do not request copies of this report from the Armstrong Aerospace Medical Research
Laboratory. Additional copies may be purchased from:

National Technical Information Service
5285 Port Royal Road
Springfield,  Virginia  22161 

Federal Government agencies and their contractors registered with the Defense Technical
Information  Center  should  direct  requests  for  copies  of  this  report  to: 

Defense Technical Information Center
Cameron  Station 
Alexandria,  Virginia  22314 

TECHNICAL  REVIEW  AND  APPROVAL 

AL-TR-1991-0128

This report has been reviewed by the Office of Public Affairs (PA) and is releasable to the National
Technical Information Service (NTIS). At NTIS, it will be available to the general public,
including  foreign  nations. 

The  voluntary  informed  consent  of  the  subjects  used  in  this  research  was  obtained  as  required  by 
Air  Force  Regulation  169-3. 

This  technical  report  has  been  reviewed  and  is  approved  for  publication. 

FOR  THE  COMMANDER 

KENNETH  R.  BOFF,  Chief 
Human  Engineering  Division 
Armstrong  Laboratory 


REPORT  DOCUMENTATION  PAGE 


Form Approved
OMB No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per
response, including the time for reviewing instructions, searching existing data sources,
gathering and maintaining the data needed, and completing and reviewing the collection of
information. Send comments regarding this burden estimate or any other aspect of this
collection of information, including suggestions for reducing this burden, to Washington
Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson
Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and
Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.

1. AGENCY USE ONLY (Leave blank)  2. REPORT DATE  3. REPORT TYPE AND DATES COVERED

October 1991  Final Report  January 1990-January 1991

4.  TITLE  AND  SUBTITLE 

Measures  of  Situation  Awareness:  Review  and 

Future  Directions 

5. FUNDING NUMBERS

PE-  62202F 

PR-  7184 

TA-  14 

WU-  25 

6.  AUTHOR(S) 

Martin  L.  Fracker,  Major,  USAF 

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Human  Engineering  Division 

Armstrong  Laboratory 

AL/CFHW 

Wright-Patterson  AFB  OH  45433-6573 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

AL-TR-1991-0128 

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

Human  Engineering  Division 

AL/CFHW 

Wright-Patterson  AFB  OH  45433-6573 

10.  SPONSORING /MONITORING 

AGENCY  REPORT  NUMBER 

11. SUPPLEMENTARY NOTES

12a.  DISTRIBUTION /AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  is 
unlimited. 

12b. DISTRIBUTION CODE

A 

13. ABSTRACT (Maximum 200 words)

Measures of situation awareness (SA), or what operators know about their
immediate situation, are reviewed. Three major approaches to SA assessment
are considered: explicit, implicit, and subjective rating. Explicit measures
require operators to self-report material in conscious memory. Implicit
measures assess the influence of relevant events on subsequent task performance.
Subjective ratings require operators to assign numerical values to the
self-assessed quality of their SA. These three measurement approaches are evaluated
in terms of their reliability and three kinds of validity: construct, content,
and criterion. Several problems requiring further research are identified and
discussed. In particular, reliability and content validity continue to present
serious difficulties, some of which suggest that new approaches to SA measurement
may still be needed.

14.  SUBJECT  TERMS 

Attention, memory probe, mental workload, reliability, signal detection theory,
validity, situation awareness, subjective measures

15. NUMBER OF PAGES

34

16. PRICE CODE

17.  SECURITY  CLASSIFICATION 

OF  REPORT 

Unclassified 

18. SECURITY CLASSIFICATION

OF  THIS  PAGE 

Unclassified 

19.  SECURITY  CLASSIFICATION 

OF  ABSTRACT 

Unclassified 

20. LIMITATION OF ABSTRACT

Unlimited

NSN 7540-01-280-5500  Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. Z39-18




SUMMARY 


Measures of situation awareness (SA), or what operators know
about their immediate situation, are reviewed. Three major approaches
to SA assessment are considered: explicit, implicit, and subjective
rating. Explicit measures require operators to self-report material
in conscious memory. Implicit measures assess the influence of
relevant events on subsequent task performance. Subjective ratings
require operators to assign numerical values to the self-assessed
quality of their SA. These three measurement approaches are evaluated
in terms of their reliability and three kinds of validity: construct,
content, and criterion. Several problems requiring further research
are identified and discussed. In particular, reliability and content
validity continue to present serious difficulties, some of which
suggest that new approaches to SA measurement may still be needed.




PREFACE 


The author thanks Michael Vidulich, Gary Reid, Maris Vikmanis,
and Mica Endsley for the many helpful discussions which aided in the
development of the ideas contained in this technical report. Mark
Crabtree's proofreading assistance is gratefully acknowledged. Any
errors,  whether  substantive  or  technical,  are  attributable  solely  to 
the  author. 




TABLE OF CONTENTS

INTRODUCTION

SA MEASUREMENT CRITERIA
    Reliability
        Methods for Evaluating Reliability
        Improving Reliability
    Validity
        Construct Validity
        Content Validity
        Criterion Validity

REVIEW OF SA MEASUREMENT METHODS
    Explicit Measures
        Retrospective Measures
        Concurrent Measures
    Implicit and Surrogate Measures
        Signal Detection Theoretic (SDT) Measures
        Surrogate Measures
    Subjective Rating Measures
        Direct Ratings
        Comparative Ratings

DIRECTIONS FOR FUTURE RESEARCH
    Continuing Research on Existing Measures
    Developing New SA Measurement Methods

REFERENCES




INTRODUCTION


Situation awareness (SA) refers to military operators' knowledge
of the immediate tactical situation (cf. Sarter and Woods, 1991). SA
may be among the most important subjects to be addressed by military
psychologists in recent years. Clausewitz (1832/1984) seems to have
been referring to SA (what others have called "the fog of war") when
he wrote that the "difficulty of accurate recognition constitutes one
of the most serious sources of friction in war, by making things
appear entirely different from what one had expected" (p. 117). Not
knowing the true tactical situation, according to Clausewitz, is one
of the principal reasons why even the simplest thing in war is
difficult, even though everything in war is very simple (p. 119). The
centrality of SA in warfighting is further evident in the importance
of surprise in war. Surprise is possible only when the enemy's SA is
poor, that is, when the enemy is unaware of the true tactical
situation. As Clausewitz observed, preventing the enemy from
achieving accurate SA is the means by which one side or the other
gains superiority and is so able to prevail (p. 198). The logical
corollary is that maintaining good SA is a necessary condition for
victory in war.

Given the importance of SA, it is hardly surprising that the Air
Force has invested considerable effort in trying to enhance combat
pilots' SA, either through pilot training (Eubanks and Killeen, 1983;
Thomas, Houck, and Bell, 1990) or through improved hardware systems
(e.g., Arbak, Schwartz, and Kuperman, 1987; Hughes, Hassoun, Ward, and
Rueb, 1990; Venturino and Kunze, 1989; Wells, Venturino, and Osgood,
1988). Evaluating the success of attempts to improve SA has been a
crucial but difficult step. Assessing the quality of pilots' SA has
turned out to be a much larger measurement problem than it first
appeared. This article first establishes criteria against which SA
metrics may be evaluated and then critically reviews the major
approaches to SA measurement that have been developed. Following this
review, directions for future research are discussed.

SA MEASUREMENT CRITERIA

The two principal criteria by which SA metrics should be
evaluated are their reliability and validity. Additional criteria
such as ease of use and operator acceptance should be considered only
when choosing between two or more metrics that are approximately
equally reliable and valid. Reliability concerns whether a metric
will remain consistent if the same quantity is measured at different
times under the same conditions. Validity mainly concerns whether the
metric actually measures what it is supposed to measure. Both are
important. On one hand, the validity of a measure cannot exceed its
reliability. On the other, there is nothing to prevent a highly
reliable metric from being invalid. For example, measuring the length
of pilots' noses is likely to provide highly reliable but completely
invalid assessments of their skill in combat.


Reliability 

Reliability  theory  revolves  around  the  concept  of  a  true  score, 
defined  as  the  outcome  of  all  the  factors  that  influence  the  attribute 
being  measured.  Concerning  SA,  these  factors  might  include 
characteristics  of  human  operators  such  as  their  natural  intelligence, 
training, and experience, as well as characteristics of the
environment such as the availability and formatting of relevant
information. Any given measure, X, of the attribute is then said to
be the sum of the true score, T, and some random error in the
measurement, e. Thus,

    X = T + e.

The variability of X, then, is the variability of the sum (T + e).
Assuming that T and e are uncorrelated, this variability can be
re-expressed as the sum of Var(T) and Var(e), denoting the variabilities
of T and e, respectively. The reliability, or consistency, of a
measure may be defined as the following ratio:

    Reliability = Var(T) / [Var(T) + Var(e)].

Reliability improves as variability due to measurement error declines.
Conversely, any factor that increases measurement error reduces
reliability (for extended discussions, see Allen and Yen, 1962;
Gulliksen, 1950; Lord and Novick, 1968; Murphy and Davidshofer, 1991).
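
As an illustration only, the true-score model above can be simulated to
check that the ratio Var(T) / [Var(T) + Var(e)] behaves as described. The
sketch below is not part of the original report; the function name, the
normal distributions, the standard deviations, and the sample size are all
assumptions chosen for the example.

```python
import random
import statistics

def simulate_reliability(n=20000, sd_true=2.0, sd_error=1.0, seed=0):
    """Simulate X = T + e and return (theoretical, empirical) reliability."""
    rng = random.Random(seed)
    # Hypothetical true scores and independent measurement errors.
    true_scores = [rng.gauss(0.0, sd_true) for _ in range(n)]
    errors = [rng.gauss(0.0, sd_error) for _ in range(n)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    # Var(T) / [Var(T) + Var(e)] from the assumed population values.
    theoretical = sd_true**2 / (sd_true**2 + sd_error**2)
    # Same ratio estimated from the simulated sample: Var(T) / Var(X).
    empirical = statistics.pvariance(true_scores) / statistics.pvariance(observed)
    return theoretical, empirical

theoretical, empirical = simulate_reliability()
print(f"theoretical = {theoretical:.3f}, empirical = {empirical:.3f}")
```

Raising sd_error in this sketch lowers both values, which is the sense in
which any factor that increases measurement error reduces reliability.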

Methods  for  Evaluating  Reliability 

Reliability can be estimated using test-retest, alternate forms,
split-half, and internal consistency methods. Test-retest methods
require collecting the same measure from the same people under the
same conditions at different times. Assuming that the measured
attribute does not change over time and that the first measurement
does not influence the second, the correlation between the two
measurements is a direct estimate of the measure's reliability. In
alternate forms methods, two alternate versions of the same
measurement technique are used on the same people and compared.
Reliability is then estimated by the correlation between the two
versions. Split-half methods are appropriate when a measure is
aggregated from several response samples, referred to as items.
Essentially, the set of items is divided in half and the correlation
between the two halves is determined. Internal consistency methods
estimate reliability from the intercorrelations among all of the items
contributing to a measure.
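
The following sketch shows, under stated assumptions, how three of these
estimates might be computed from item-level data. It is illustrative
rather than a prescribed procedure: the simulated data, the helper names
(pearson, administer, split_half, cronbach_alpha), and the choice of
Cronbach's alpha as the internal consistency coefficient are assumptions
that do not come from the report.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def administer(true_scores, n_items, rng, sd_error=1.0):
    """One administration: each item is the person's true score plus fresh error."""
    return [[t + rng.gauss(0.0, sd_error) for _ in range(n_items)]
            for t in true_scores]

def split_half(items):
    """Correlate the sum of alternate items (1st, 3rd, ...) with the sum of the rest."""
    first = [sum(row[0::2]) for row in items]
    second = [sum(row[1::2]) for row in items]
    return pearson(first, second)

def cronbach_alpha(items):
    """Internal consistency from item variances and total-score variance."""
    k = len(items[0])
    item_vars = [statistics.variance([row[j] for row in items]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in items])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

rng = random.Random(1)
true_scores = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical people
time1 = administer(true_scores, 8, rng)
time2 = administer(true_scores, 8, rng)  # same people, same conditions, later

test_retest = pearson([sum(r) for r in time1], [sum(r) for r in time2])
half_r = split_half(time1)
alpha = cronbach_alpha(time1)
print(f"test-retest {test_retest:.2f}, split-half {half_r:.2f}, alpha {alpha:.2f}")
```

Note that the raw split-half correlation describes a measure only half as
long as the full one, so it will usually fall below the other two
estimates unless it is adjusted for length.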

Improving Reliability


The easiest way to improve the reliability of a measure is to
increase the number of observations that contribute to the measure.
If the observations are added together to form a composite score, then
the sum will be at least as reliable as the least reliable
observation. Further, if the observations are correlated, then the
reliability of their sum will increase (1) as the number of
observations increases and (2) as the correlations among observations
are strengthened. Thus, a good way to improve the reliability of a
measure is to obtain a larger number of correlated observations and
use their sum (or average) as the measure.
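
Both effects can be seen in the standard Spearman-Brown relationship,
which the report does not state but which is a conventional way to express
this point. The sketch below assumes observations with equal variances and
a common average intercorrelation; the function name and the example
values are hypothetical.

```python
def composite_reliability(mean_r, k):
    """Spearman-Brown style reliability of a sum of k observations whose
    average intercorrelation is mean_r (assumes equal variances)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * mean_r / (1.0 + (k - 1) * mean_r)

# Reliability rises with the number of observations (across each row)
# and with the strength of their intercorrelation (between rows).
for mean_r in (0.2, 0.5):
    row = [round(composite_reliability(mean_r, k), 3) for k in (1, 2, 4, 8, 16)]
    print(mean_r, row)
```

Under these assumptions, observations that intercorrelate at only 0.2
yield a composite reliability of 0.5 with four observations and 0.8 with
sixteen.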

In contrast to composite scores (sums or averages), profile
scores decrease in reliability as the correlations among observations
increase. Profile scores are measures of how one variable differs
from another. For example, one might measure pilots' awareness of the
locations of enemy aircraft, enemy surface-to-air missiles, and enemy
tanks. Some pilots might have good awareness for aircraft locations
but poor awareness for missiles and tanks. Other pilots might have
poor awareness for aircraft but good awareness for missiles and tanks.
Thus, looking at SA profiles might reveal specific weaknesses in SA
for specific pilots. Comparing profiles is essentially equivalent to
comparing differences between variables (e.g., SA for aircraft versus
SA for tanks). If two variables are correlated, then they tend to
reflect the same true score. Thus, subtracting one from the other
will tend to leave only the random error. As a result, differences
between correlated variables will tend to be highly unreliable. In
general, then, profile scores should be avoided. When possible,
composite scores should be used instead.
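
The loss of reliability in difference scores can be made concrete with the
classical formula for the reliability of a difference between two
equal-variance measures. The formula is a standard psychometric result,
not something stated in the report, and the function name and example
reliabilities below are assumptions for illustration.

```python
def difference_reliability(r_xx, r_yy, r_xy):
    """Reliability of D = X - Y for equal-variance X and Y.

    r_xx, r_yy: reliabilities of the two measures.
    r_xy: correlation between the two measures.
    """
    if r_xy >= 1.0:
        raise ValueError("r_xy must be below 1")
    return (0.5 * (r_xx + r_yy) - r_xy) / (1.0 - r_xy)

# Two hypothetical SA measures (e.g., aircraft vs. tanks), each with
# reliability 0.8, at increasing levels of intercorrelation.
for r_xy in (0.0, 0.4, 0.6, 0.7):
    print(r_xy, round(difference_reliability(0.8, 0.8, r_xy), 3))
```

With both measures at a reliability of 0.8, the difference score is as
reliable as its components when they are uncorrelated, but its reliability
drops to 0.5 when they correlate at 0.6.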

Validity 

Validity is not a simple concept. At least three types of
validity may be identified: construct, content, and criterion.

Construct  Validity 

A construct is some unobservable psychological attribute such as
situation awareness that is hypothesized to account for some aspect of
human behavior. Construct validity refers to the degree that a
measure can quantify this unobservable psychological attribute.
Assessing construct validity involves identifying (1) human behaviors
that are logically related to the construct in question, (2) other
constructs that are either related or unrelated to the target
construct, and (3) behaviors that are logically related to these new
constructs (Murphy and Davidshofer, 1991). One then demonstrates that
behaviors related to the construct (a) behave as they are supposed to,
(b) associate with other related behaviors, and (c) dissociate from
behaviors unrelated to the construct. Because statements of the
relationship between specific behaviors and a given construct are
theoretical in nature, tests of construct validity may also be viewed
as tests of the underlying theory. Consequently, failures to
establish construct validity are invariably ambiguous. Such failures
may mean that the measure is invalid, or that the underlying theory is
incorrect. If tests of several alternative measures within the same
theoretical framework all fail to establish their construct validity,
then one may conclude that the underlying theory is at least not very
useful.

Three  criteria  are  proposed  in  order  to  establish  the  construct 
validity  of  an  SA  measure.  First,  the  measure  should  avoid  confusing 
momentary  with  reflective  knowledge  of  the  situation.  Second,  the 
measure  should  show  that  SA  declines  when  attention  is  spread  across  a 
larger  or  more  complex  situation.  Third,  the  measure  should  be 
related  to  measures  of  mental  effort  such  that,  if  situation 
assessment becomes more difficult, then SA declines, mental effort
increases,  or  both.  Each  criterion  is  discussed  in  turn. 

Momentary versus reflective SA. The distinction between
momentary  and  reflective  SA  is  similar  to  the  distinction  between 
battlefield  and  armchair  generals.  Battlefield  generals  must  assess 
what  is  actually  happening  whereas  armchair  generals  need  only  assess 
what  is  likely  to  happen.  Of  course,  assessments  made  in  the  comfort 
of  an  office  or  living  room  with  plenty  of  time  to  reflect  upon  them 
may  be  accurate  and  insightful,  but  they  may  also  be  quite  different 
from  the  assessments  which  the  same  individual  might  make  under  the 
pressure  of  the  battlefield.  As  will  be  seen,  some  methods  for 
measuring SA may not distinguish well between these two kinds of
assessments. Yet making the distinction is important. Individuals
who can develop accurate reflective SA but not good momentary SA will
make poor battlefield commanders. In similar fashion, military
information systems that poorly support momentary SA may appear better
than they are if metrics used to evaluate them actually measure
reflective  SA. 

Attention  and  SA.  Logically,  operators  cannot  know  the  state  of 
a  situational  variable  until  they  have  attended  to  it.  For  example, 
pilots  cannot  know  whether  there  is  an  enemy  aircraft  at  a  certain 
location  unless  they  aim  their  radar  at  that  location  or  attend  to 
some  other  source  of  information  such  as  provided  by  a  ground  control 
intercept  officer.  A  useful  metaphor  for  attention  is  that  of  a 
spotlight: attention can be spread over a larger or smaller area, but
increasing the area lowers the quality of processed information
(Eriksen and Yeh, 1985). Further, Downing's (1988) experiments
implied that increasing the number of objects within the same-size
attentional beam also reduces processing quality. Thus, when the area
to be attended grows larger, or when the number of variables to be
attended increases, operators' SA should decline.

Effort and SA. When a task becomes more difficult, whether
because the load on attention has increased or for some other reason,
performance quality may not decline if sufficient additional effort is
put forth. Thus, if maintaining SA becomes more difficult, SA may or
may not decline depending upon whether and how much effort is
increased. Evaluations of the construct validity of SA metrics
should therefore include assessments of effort. When task difficulty
increases but effort does not, then a valid measure of SA will decline;
on the other hand, if effort does increase, then SA may decline little
if at all. Measurement of effort, more commonly referred to as "mental
workload," is a fairly recent and controversial development in
psychology (Gopher and Donchin, 1986; Moray, 1979; Ogden, Levine, and
Eisner, 1979; Wickens, 1984; Wierwille, 1979; Williges and Wierwille,
1979). Further, the theoretical assumptions underlying much workload
measurement research have recently come under attack (Fracker and
Wickens, 1989; Hirst and Kalmar, 1987; Navon, 1984; Navon and Miller,
1987). Nevertheless, several practical measures of mental workload
have become available (Moray, 1988; O'Donnell and Eggemeier, 1986) and
are in wide use. As a result, theoretical controversies
notwithstanding, it appears possible to evaluate whether an SA metric
responds appropriately to increasing task difficulty and changes in
assessed effort.


Content  Validity 

Content validity refers to the degree that the knowledge or
behaviors assessed by a metric represent the knowledge or task domain
being measured. Assessing content validity usually involves analyzing
the specific knowledge or behaviors relevant to the domain and
rendering a judgment as to whether the sampled knowledge or behaviors
are in fact representative. In SA measurement, establishing content
validity first requires analyzing a given military task in order to
determine what kinds of information the operator needs to know. This
information, once determined, can then be compared to the information
sampled by the SA metric. Content validity would be considered high
if all important kinds of information in the domain (and no irrelevant
domains of information) are sampled by the metric.
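
The comparison described above can be sketched as a simple set comparison
between what a task analysis says the operator needs to know and what a
metric actually samples. This is only an illustrative bookkeeping aid: the
function name and the information categories are hypothetical, and a real
list would come from a task analysis of the mission in question.

```python
def content_coverage(required, sampled):
    """Compare the kinds of information a task analysis requires with
    those an SA metric samples."""
    required, sampled = set(required), set(sampled)
    return {
        # Share of required information kinds that the metric samples.
        "coverage": len(required & sampled) / len(required),
        # Required kinds the metric fails to sample.
        "missing": sorted(required - sampled),
        # Sampled kinds that the task analysis did not call for.
        "irrelevant": sorted(sampled - required),
    }

# Hypothetical categories for illustration only.
required = {"enemy aircraft location", "enemy missile location",
            "own fuel state", "enemy objective"}
sampled = {"enemy aircraft location", "enemy missile location",
           "cockpit temperature"}
report = content_coverage(required, sampled)
print(report)
```

In these terms, content validity would be judged high only when coverage
is complete and the list of irrelevant kinds is empty.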

Content  validity  is  specific  to  different  mission  domains.  For 
example,  an  SA  metric  having  high  content  validity  for  a  tactical  air 
defense  mission  will  likely  have  low  content  validity  for  a  strategic 
bombing mission. Nevertheless, there may be a situational structure
common to most missions. Fracker (1988) proposed that such a
structure might have five levels: goals, organizational functions,
processes, and states. In this representation, situations are viewed
as sets of variables whose states can change over time. These dynamic
variable states are said to result from the interaction of opposing
forces each directing their operations toward specific goals. In
order to achieve these goals, each force has organized itself into
particular units and assigned to each unit specific functions. The
interactions among unit functions, referred to as processes, lead
directly to the momentary changes in situation variable states.

Fracker's (1988) five-level situational structure implies that
operator SA might differ across levels. For example, operators might
be aware of enemy objectives (high-level SA) but uncertain as to what
specific actions the enemy has undertaken in order to achieve those
objectives (low-level SA). Conversely, operators might know what
actions the enemy has undertaken but not know what objective those
actions served. A content valid measure of SA, therefore, should not
only sample the variables which comprise the situation, but should
also sample all five levels of the situational structure.

Criterion  Validity 

Criterion validity refers to the degree of correlation between
the metric and some objective measure that could be used to evaluate
the accuracy of a decision based upon the metric. For example, if the
SA metric is to be used to select one of several competing cockpit
designs for a new fighter aircraft, the criterion might be success in
combat.

Establishing criterion validity is usually complicated by the
fact that many factors may contribute to the criterion measure.
Combat success, for instance, depends not only upon accurate SA but
also upon wise decision making and effective response execution.
While wise decisions and effective responses are dependent upon
accurate SA, possessing the latter is no guarantee that the others
will follow. Thus, an otherwise valid measure of SA might appear poor
if it is tested on operators who make poor decisions or unskilled
responses. This observation implies a dilemma in establishing the
criterion validity of SA metrics. If inexperienced or only partially
trained operators are included in the study, the correlation between
measured SA and the criterion may appear low for reasons that have
nothing to do with the SA metric itself. On the other hand, if only
experienced and highly trained operators are included, a high
correlation may be precluded for purely statistical reasons
(restriction of range). Paradoxically, then, criterion validity,
which is often the most important form of validity to the user, may be
the most difficult to establish and hence the least likely to be
assessed.
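
The restriction-of-range problem can be demonstrated with a small
simulation. The sketch below is an illustration under assumed values, not
an analysis from the report: it supposes a population correlation of 0.6
between an SA measure and a criterion, and it treats the top 20 percent on
SA as a stand-in for a sample of experienced, highly trained operators.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2)
rho = 0.6   # assumed population correlation between SA and the criterion
n = 20000
sa = [rng.gauss(0.0, 1.0) for _ in range(n)]
criterion = [rho * s + (1 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0) for s in sa]

# Correlation across the full range of operators.
full_r = pearson(sa, criterion)

# Keep only the top 20 percent on SA (hypothetical "expert" sample).
cutoff = sorted(sa)[(n * 4) // 5]
kept = [(s, c) for s, c in zip(sa, criterion) if s >= cutoff]
restricted_r = pearson([s for s, _ in kept], [c for _, c in kept])

print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
```

The underlying relationship is identical in both samples; the restricted
correlation is lower only because the selected operators differ so little
from one another on SA.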

REVIEW OF SA MEASUREMENT METHODS


Three major approaches to assessing situation awareness are
reviewed: explicit, implicit, and subjective rating. The distinction
between explicit and implicit measures comes from a distinction made
by some psychologists between explicit and implicit forms of memory
(see Roediger, 1990, for a readable discussion). Explicit measures
require people to self-report material in memory of which they are
consciously aware. As a result, such measures are considered
subjective in nature, but are distinguished from subjective rating
measures, which involve assignment of numerical values to the quality
of the content of awareness. Unlike explicit measures, implicit
measures do not rely on self-reports of awareness; rather, such
measures are derived from task performance. Specifically, SA is
inferred from the influence of prior events on task performance (e.g.,
evading an attacking aircraft, locking on to an enemy target). Thus,
implicit measures may be considered objective rather than subjective
in nature.

In reviewing each type of metric, the measurement methodology is
first described. Then any evidence pertaining to the reliability and
validity of the resulting measures is reviewed.

Explicit Measures

If SA is regarded as the information immediately available in
conscious awareness, then explicit measures are the most direct way of
assessing SA. Two types of explicit measurement methods can be
identified: retrospective event recall and concurrent memory probes.

Retrospective Measures

Kibbe (1988) and Whitaker and Klein (1988) both used
retrospective event recall to assess SA. Kibbe had laboratory
subjects perform a radar warning receiver (RWR) monitoring task alone
or with a concurrent pursuit tracking task. During the task, five
different types of threats appeared on the RWR several times.
Following the task, subjects were asked to recall and position threat
events along a timeline representing their flight path. In addition,
subjects were asked to estimate the number of times each type of
threat had occurred. Kibbe found that timeline recall and placement
accuracy depended on the type of threat: the more severe the threat,
the more accurate its recall. However, accuracy was not affected by
whether the concurrent tracking task was performed. Presence or
absence of the tracking task did affect the estimate of threat type
frequency, however: in the dual-task condition, subjects
underestimated the number of threats; in the single-task condition,
subjects overestimated threat frequency.

Whitaker and Klein (1988; see also Klein, Calderwood, and
Clinton-Cirocco, 1985) took a quite different approach to
retrospective SA measurement, using what they called the "Critical
Decision Method." Based on Flanagan's (1954) critical incident
technique, subjects were asked to recall their step-by-step decisions
during a complex real-world task such as planning a military
operation. Applying protocol analysis techniques, Whitaker and Klein
made a significant observation: subjects seemed to use only
immediately available information. Regardless of its importance to
task success, information that required more than a cursory search was
not sought.

Reliability. No reliability studies of retrospective event
recall are known to have been conducted. Kibbe's (1988) timeline
recall method could be reliable to the extent that errors in recall
are averaged over time and events, however. Regarding Whitaker and
Klein's (1988) Critical Decision Method, proprietary scoring and
analysis procedures prohibit an assessment of the likelihood that the
method could be reliable.

Construct validity. The most serious challenge to the construct
validity of retrospective SA measurement is its inability to
distinguish between momentary and reflective SA. A growing body of
research shows that as the time between an event and its recall
increases, people become more likely to recall "facts" about the event
that in fact are not true (Loftus, 1979; Loftus and Loftus, 1980).
These false recollections appear to be otherwise reasonable inferences
drawn from information that people are still able to remember (Carr,
1986). Because progressively more information is forgotten as time
goes on, such false inferences increase in frequency as the event
becomes more distant. Thus, retrospective recall seems as likely to
measure what operators can infer happened (reflective SA) as what they
can actually remember having happened (momentary SA).

Besides confounding momentary with reflective SA, retrospective
recall may also fail to decline as the load on attention increases.
In Kibbe's (1988) experiment, adding tracking to the RWR monitoring
task should have diverted attention away from the monitoring task,
thereby degrading the quality of SA, but adding the tracking task had
no effect on timeline placement accuracy. At least three explanations
for this failure are possible. First, the failure could have resulted
from forgetting: single-task SA may in fact have been more accurate
while the monitoring task was performed, but the more accurate
information may have been forgotten by the time of recall. Second,
Kibbe's subjects may have allocated only residual attention to the
tracking task, thereby producing no change in the amount of attention
allocated to the monitoring task. Unfortunately, this possibility
cannot be evaluated because Kibbe did not obtain a single-task
baseline for the tracking task. Third, subjects may have compensated
for the increased difficulty of the task by exerting more effort.
Kibbe did not measure mental workload, however, so this possibility
cannot be evaluated either.

In spite of the foregoing ambiguity, it is still possible that
attention may play a role in retrospective recall because information
that attracts more attention is more likely to be recalled later
(Logan, 1988; Wyer and Srull, 1986). Thus, the fact that Kibbe's
(1988) subjects remembered high threat but not low threat events may
indicate that the former received more attention than the latter.

Content validity. Retrospective techniques can achieve a degree
of content validity depending upon how well they are structured.
Kibbe's (1988) time-line placement technique seems able to measure
operators' recall of how variable states changed over time and so may
sample both state and process awareness. Whitaker and Klein's (1988)
Critical Decision Method may also sample higher levels of SA if
operators give their rationale for doing what they did.
Nevertheless, both techniques seem to rely on operators' spontaneous
recall of information in order to sample the relevant information
domain; in a sense, then, these techniques leave content validity up
to the operator.

Criterion validity. In Kibbe's (1988) experiment, a meaningful
criterion was subjects' speed and accuracy in detecting and
identifying threats as they appeared on the RWR. Unfortunately, she
did not report correlations between time-line placement accuracy and
the criterion. Nevertheless, a poor correlation may be likely because
speed and accuracy on the detection-identification task were affected
by threat type whereas placement accuracy was not. Whitaker and Klein
(1988) did not report any criterion measures.

Concurrent Measures

The most significant objection to retrospective measures is the
confounding of momentary and reflective SA. As discussed, one reason
for this confound is the temporal delay between events and their
recall. One solution to this problem is to probe memory closer to the
time specific events actually occur, that is, during the mission
rather than afterwards. Several implementations of such concurrent
memory probes have appeared in the recent literature (Endsley, 1989;
Fracker, 1991; Fracker and Davis, 1990; Marshak, Kuperman, Ramsey, and
Wilson, 1987; Venturino and Kunze, 1989; Wells, Venturino, and Osgood,
1988). The basic idea in most of these implementations is to freeze a
simulated mission after some random interval of time, blank the
pilot's displays, and ask the pilots to recall certain items of
information, such as the locations of enemy aircraft. SA is then
quantified as the pilot's error in responding to these queries.


Reliability. Formal studies of memory probe reliability have not
been encouraging. Fracker (1991) evaluated the test-retest
reliability of memory probes administered on consecutive days to the
same subjects under identical experimental conditions. In the
experiment, non-pilot subjects performed a simulated combat-like task
in which they monitored the positions of friendly, enemy, and neutral
aircraft displayed on a computer screen. Periodically, the simulation
was frozen and one of the aircraft disappeared. Subjects were either
to show where the aircraft had been located (location probe) or to
indicate its identity as friend, foe, or neutral (FFN probe). Table 1
shows the reliability (and validity) coefficients averaged across
experimental conditions; tests of statistical significance followed
Dunlap, Silver, and Bittner's (1986) recommendations. Location probes
appeared highly unreliable while FFN probes fared somewhat better,
although their reliability was still not impressive. Fracker
attributed the generally poor reliability coefficients to
idiosyncratic practice effects between sessions. Regarding location
probes, Fracker suggested that location error might have been measured
with more precision than was psychologically meaningful. Perhaps a
more appropriate level of precision would have produced better
reliability.
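
A test-retest coefficient of this kind is conventionally the Pearson
correlation between the same subjects' scores in two sessions. The
sketch below assumes that convention. The function name and the
per-subject scores are hypothetical and are not data from Fracker
(1991).

```python
import math


def pearson_r(session_1, session_2):
    """Pearson correlation between paired scores from two sessions."""
    if len(session_1) != len(session_2) or len(session_1) < 2:
        raise ValueError("need two equally long lists with at least two subjects")
    n = len(session_1)
    mean_1 = sum(session_1) / n
    mean_2 = sum(session_2) / n
    dev_1 = [x - mean_1 for x in session_1]
    dev_2 = [y - mean_2 for y in session_2]
    covariation = sum(a * b for a, b in zip(dev_1, dev_2))
    spread = math.sqrt(sum(a * a for a in dev_1) * sum(b * b for b in dev_2))
    if spread == 0:
        raise ValueError("scores must vary in both sessions")
    return covariation / spread


# Hypothetical location-probe errors for five subjects on consecutive days.
day_1 = [2.1, 3.4, 1.8, 4.0, 2.9]
day_2 = [2.5, 2.2, 3.1, 3.6, 1.9]
test_retest = pearson_r(day_1, day_2)
print(f"test-retest r = {test_retest:.2f}")
```

A coefficient near 1 would mean that subjects keep their rank order
from one day to the next; values near 0, like the location probe
coefficient in Table 1, mean that day-one scores say little about
day-two scores.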

Table 1. Reliability and validity coefficients from Fracker (1991).
Probability of Fisher's z (N = 32) in parentheses.

                       Location Probe        FFN Probe          Envelope
                           Error        Accuracy    Latency    Sensitivity

Reliability                 .13            .49        .54          .42
                           (ns)           (.01)     (.005)       (.025)

Correlation w/
  Avoidance Failures        .10           -.11        .20         -.39
                           (ns)           (ns)      (.10)        (.025)

  Kill Probability          .02            .10       -.29
                           (ns)           (ns)      (.05)


In spite of the poor test-retest correlations, other evidence
implied that reliability might be better than indicated. Fracker's
(1991) two experiments manipulated some of the same factors and
observed a high degree of consistency in the memory probe data for
each experimental condition across the two experiments. While this
consistency across experiments does not formally demonstrate
reliability, it does suggest that further research to determine the
reliability of memory probes may be justified.

Construct validity. Although memory probes may be less likely
than retrospective recall to confound momentary with reflective SA,
there are suggestions that the two may still be confounded to some
degree. Basic laboratory research has shown that information stored
in working memory decays in only a matter of seconds without active
rehearsal (Peterson and Peterson, 1959). Thus, there has been concern
that pilot SA might decay during the memory probe freezes,
particularly SA for information that is probed later rather than
earlier during the freeze. In order to respond to probes later in the
freeze, operators might then have to rely on reflective SA. To assess
the degree of working memory decay, Endsley (1989) had experienced
fighter pilots fly simulated combat missions in two experiments. In
the first, she manipulated freeze duration and found virtually no
increase in error even after delays of six minutes. In the second
experiment, she reasoned that memory decay during the freeze would
interfere with pilots being able to resume the mission following the
freeze. Thus, she varied the number of freezes from 0 to 3 and
studied the effect on mission performance (kills and losses). She
reported that the number of freezes had no effect on the performance
measures.

The divergence of Endsley's (1989) data from well-established
laboratory findings demands explanation. Endsley acknowledged that
her measures may not have been sufficiently sensitive to detect decay
effects, but she felt the most likely explanation lay in differences
between her experiments and traditional laboratory tasks. Whereas
basic laboratory experiments have typically studied retention of
disorganized stimuli such as random sequences of letters or digits,
Endsley's experiments involved retention of inherently meaningful
tactical information by expert combat pilots. She hypothesized that
her pilots relied on information stored in long-term memory in order
to respond to the probes and then to resume the mission. While this
hypothesis is consistent with most cognitive models of how pilots
develop and maintain their SA (Endsley, 1988; Fracker, 1988; see
Ericsson and Staszewski, 1989, for a different cognitive approach), it
also may render memory probe data ambiguous with regard to whether
pilots were actually aware of the probed information prior to the
probe (momentary SA). Conceivably, pilots may not have been aware of
the information prior to the probe; rather, the probe may have served
as the stimulus for an inference from knowledge gained through
previous experience. In short, the probe may have measured the
quality of pilots' reflective rather than momentary SA.

A related difficulty is that the probe procedure may alter
pilots' SA. In effect, the probe conveys a message to attend to a
specific item of information in the future, and implies a penalty for
not attending. As a result, pilots might attend to information that
otherwise might have been ignored. Probes might thus shape pilots' SA
rather than just measure it. This problem can probably be avoided by
ending the mission after the first freeze and never using the same
subject again. Such a solution may be impractical, however; the
number of pilots available in SA research is usually small, and the
need for large amounts of data is usually great.

Unlike working memory decay effects, attention effects have been
more supportive of memory probe construct validity. Fracker (1991)
manipulated combat intensity by increasing the number of threatening
aircraft in the simulation. As noted earlier, this manipulation led
to a decrease in SA as measured by location error and FFN accuracy.
If an increase in enemy number can be viewed as an increase in the
load placed on attention, then the relation between probed SA and
attention appears to be confirmed. Other aspects of Fracker's data
were not entirely consistent with this conclusion, however. For
example, in two of the experiments, subjects sometimes had to monitor
an additional information display, but the presence or absence of this
additional monitoring task had no effect on the memory probe measures.
At present, the reasons for this result are not known.

With respect to effort and SA, the construct validity of memory
probes is not clear. Fracker (1991) found that, across experimental
conditions, poorer probed SA (i.e., increased location error,
decreased FFN accuracy) was accompanied by increased failures to avoid
ground threats. One possible explanation is that effort was diverted
from the avoidance task to SA maintenance as maintaining SA became
more difficult. If this interpretation were correct, then one might
expect that probed SA and avoidance failures would be correlated
within experimental conditions as well, but the average correlation
was small (see Table 1). However, a strong correlation might not be
expected if increased allocation of effort to SA maintenance prevented
SA from declining. Further, the correlation might also be limited by
poor reliability of both memory probes and avoidance failures:
reliability of the latter was poor (r = .26).
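
The limit that unreliability places on an observed correlation can be
made concrete. Under classical test theory (an assumption brought in
here, not an analysis reported by Fracker), the expected observed
correlation between two measures cannot exceed the square root of the
product of their reliabilities. The sketch below applies that bound
to the memory probe reliabilities in Table 1 and the avoidance-failure
reliability of .26; the function and variable names are illustrative.

```python
import math


def max_observable_correlation(reliability_x, reliability_y):
    """Classical-test-theory ceiling on |r_xy|: sqrt(r_xx * r_yy)."""
    for value in (reliability_x, reliability_y):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(reliability_x * reliability_y)


# Test-retest reliabilities taken from the text and Table 1.
AVOIDANCE_RELIABILITY = 0.26
PROBE_RELIABILITY = {
    "location error": 0.13,
    "FFN accuracy": 0.49,
    "FFN latency": 0.54,
}

ceilings = {
    name: max_observable_correlation(r, AVOIDANCE_RELIABILITY)
    for name, r in PROBE_RELIABILITY.items()
}
for name, ceiling in ceilings.items():
    print(f"{name}: ceiling on |r| with avoidance failures = {ceiling:.2f}")
```

On these figures the ceiling for location error is below .20, so a
small observed correlation with avoidance failures is what poor
reliability alone would predict. Sample correlations can still stray
above or below the bound through sampling error.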

Content validity. Like retrospective measures, concurrent memory
probes can achieve a degree of content validity depending upon how
they are structured. Endsley's (1989) work developing SAGAT
(Situation Awareness Global Assessment Technique), a sophisticated
implementation of memory probes for use in high-fidelity flight
simulations, has focused on achieving a high degree of content
validity for specific military missions. Nevertheless, while memory
probes are particularly useful for sampling the momentary states of
various situational variables, it is not clear how they can be used to
sample higher levels of SA such as goal or organization awareness.
Endsley (1989) has suggested that pilots can be asked to indicate the
future rather than current states of situational variables, but such
responses may indicate little beyond pilots' understanding of the
immediate processes controlling momentary states. Thus, while the
content validity with respect to momentary states and, perhaps,
processes may be high, memory probes may possess little potential for
content validity at higher levels.

Criterion validity. Of those studies evaluating memory probes,
only Fracker (1991) appears to have compared probed SA to a criterion
measure of successful mission performance. In Fracker's experiments,
subjects controlled an icon representing a friendly aircraft and used
it to attack and destroy enemy aircraft. A reasonable measure of
mission success, then, is the probability of a kill given an
engagement with the enemy. The within-condition correlation between
probed SA and kill probability was essentially zero for both location
error and FFN accuracy but was statistically significant for FFN probe
latency (see Table 1). Again, the poor reliability of probed SA may
account for these poor correlations. (Kill probability produced a
test-retest reliability coefficient of .48.)

Implicit and Surrogate Measures

Explicit measures of SA clearly have liabilities: both their
reliability and construct validity are in question. Perhaps for this
reason, some researchers have focused on developing implicit measures
(Eubanks and Killeen, 1983; Fracker, 1991; Venturino, Hamilton, and
Dvorchak, 1989). In implicit measurement, the goal is to determine
whether pilots' mission performance has been influenced appropriately
by the occurrence of specific events. The most straightforward
approach uses signal detection theory to derive an SA metric (Eubanks
and Killeen, 1983; Fracker, 1991). In addition, surrogate measures
have been proposed which do not directly assess the impact of events
on performance but still attempt to use performance as an index of SA
(Venturino et al., 1989).

Signal Detection Theoretic (SDT) Measures

Suppose that event X occurs. If pilots are aware of the event's
occurrence, then they should respond in one way (a "hit"); but if
pilots are unaware that the event occurred, then they should respond
in a clearly different way (a "miss"). Unfortunately, the
interpretation of hits and misses is always complicated by response
bias. For example, pilots may be biased to attack other aircraft when
they are unsure whether the aircraft is friend or foe. In order to
identify and correct for such bias, it is necessary to also measure
false alarms (responding as if the event occurred when it did not) and
correct rejections (not responding when the event did not occur).
Once these four types of responses have been identified and counted
over the course of a mission, there are several methods available for
computing the pilots' ability to discriminate occurrence from
non-occurrence of the target event, referred to as sensitivity
(Macmillan and Creelman, 1990). Because sensitivity declines if
pilots are unaware of events occurring and increases if they are so
aware, the measure provides both an empirical and an intuitively
reasonable measure of awareness for a particular kind of event (cf.
Hawkins, 1990).
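
As an illustration of one such method, the sketch below computes the
familiar d' index and the criterion c from the four response counts.
It assumes the equal-variance Gaussian model and a log-linear
correction (adding 0.5 to each cell) so that rates of exactly 0 or 1
do not yield infinite values. Both choices are assumptions made for
this example and are not necessarily the computations used in the
studies reviewed here; the function name is hypothetical.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under the equal-variance Gaussian model."""
    counts = (hits, misses, false_alarms, correct_rejections)
    if any(count < 0 for count in counts):
        raise ValueError("counts must be non-negative")
    # Log-linear correction keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    false_alarm_rate = (false_alarms + 0.5) / (
        false_alarms + correct_rejections + 1.0
    )
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion


# A pilot who fires freely: many hits, but many false alarms as well.
d_prime, criterion = sdt_indices(
    hits=90, misses=10, false_alarms=60, correct_rejections=40
)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

A negative criterion marks a liberal bias toward responding, which is
the situation described above for pilots inclined to attack when
unsure. The d' value is what remains after that bias is separated
out.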

Any discrete measure of performance can be used to measure
sensitivity providing that the following three conditions can be
satisfied. First, target events as well as the responses to be
counted as hits must be unambiguously defined so that the presence and
absence of both are clear and countable. Note that continuous
measures (e.g., velocity, altitude) can be used if particular changes
in the measures can be defined as events or responses (e.g., a
sufficiently large decrease in velocity or altitude). Second, when
more than one hit response is possible contingent upon which of
several alternative forms of an event occurs, the sets of events and
responses must both be finite. Third, each alternative form of an
event must call for exactly one response, and that response must be
unique to that alternative.

In meeting the foregoing three conditions, the main challenge may
often be to find response measures that react to the events of
interest. Fortunately, for some kinds of events, appropriate measures
are not hard to find. Both Eubanks and Killeen (1983) and Fracker
(1991) were interested in whether subjects would detect the entry of
enemy targets into the subjects' weapon envelope. Eubanks and Killeen
studied the performance of Air Force F-4E pilots in simulated air-to-
air combat. Hits, misses, false alarms, and correct rejections were
defined in terms of whether or not there was an enemy in the envelope,
and whether or not pilots fired the weapon.
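
The four-way classification just described can be sketched as a
simple tally. The observation format, one (enemy_in_envelope, fired)
pair per scoring opportunity, is a hypothetical convenience for this
example and is not taken from either study.

```python
from collections import Counter

OUTCOMES = ("hit", "miss", "false_alarm", "correct_rejection")


def classify(enemy_in_envelope, fired):
    """Label one scoring opportunity by event presence and pilot response."""
    if enemy_in_envelope:
        return "hit" if fired else "miss"
    return "false_alarm" if fired else "correct_rejection"


def tally(observations):
    """Count each outcome over (enemy_in_envelope, fired) pairs."""
    counts = Counter(classify(enemy, fired) for enemy, fired in observations)
    return {outcome: counts.get(outcome, 0) for outcome in OUTCOMES}


# Hypothetical record of five scoring opportunities.
example = [
    (True, True),    # enemy in envelope, weapon fired
    (True, False),   # enemy in envelope, no shot
    (False, True),   # no enemy in envelope, weapon fired
    (False, False),  # no enemy in envelope, no shot
    (False, False),
]
print(tally(example))
```

The resulting counts are the inputs that a sensitivity computation
requires.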

Reliability. Fracker (1991) reported that the test-retest
reliability for envelope sensitivity was similar to that for FFN
probes (see Table 1).

Construct validity. A major advantage of envelope sensitivity
over explicit measures is that there is little opportunity for
momentary SA to be confounded with reflective SA: if envelope
sensitivity measures SA at all, it is clearly momentary SA that is
measured. Nevertheless, envelope sensitivity may confound momentary
SA with other processes that intervene between SA formation and
mission success; such processes may include response selection
(decision making) and response execution. This possibility may become
more likely as the response used to define a "hit" becomes more
complex, requiring greater knowledge or skill on the part of the
operator. Such unwanted incidental effects may help to explain why
some studies have found envelope sensitivity to be a noisy measure
insensitive to important experimental manipulations (Wooldridge et
al., 1982). Under some circumstances, then, the inference of
momentary SA from sensitivity may be invalid if situational factors
also affect other intervening processes, a possibility difficult to
rule out.

Whether or not sensitivity confounds momentary SA with other
factors, Fracker (1991) has found that envelope sensitivity behaves
like a measure of SA in at least one respect. Specifically,
sensitivity declined as the number of enemy aircraft in the
simulation (i.e., the load on the attentional spotlight) increased.
The average correlation of sensitivity with avoidance failures (a
measure related to mental effort) was about as high as one might
expect given their respective reliabilities (see Table 1). Consistent
with this correlation, Eubanks and Killeen (1983) found that pilots'
envelope sensitivity improved dramatically with training. Assuming
that training decreases the amount of mental effort required to
perform a task (cf. Schneider and Shiffrin, 1977), this result
suggests that sensitivity improves as the demand for effort declines.

Content validity. The most serious challenge to the sensitivity
metric may concern its content validity. There are at least three
practical problems that may limit the ability of the sensitivity
metric to sample the whole content domain of a mission. First,
sensitivity can be measured for only a single kind of event. If the
researcher is interested in a variety of event types, then each will
require its own measure. Thus, the measure of SA will be a set of
sensitivity parameters rather than a single parameter. Second, there
may not always exist a natural response measure for events that may
nonetheless be of interest (e.g., the pilot's awareness of changes in
his proximity to the ground). Third, defining non-events so that
false alarms and correct rejections can be counted may present a
challenge. In simulations, a simple solution is to count the absence
of the target event during each program cycle as one non-event. In
non-simulated environments, a simple solution may not exist (see
Wickens, 1984, for a discussion). In addition to these practical
limitations, there is also an important theoretical limitation: the
sensitivity metric is based on detections of changes in momentary
states. As a result, sensitivity probably cannot be used to assess
higher levels of SA such as organization or goal awareness.

Criterion validity. Studies of the criterion validity of the
sensitivity metric have not been conducted. In Fracker's (1991)
experiments, kill probability was equivalent to the probability of a
hit used to calculate sensitivity. As a result, the obtained high
correlation between sensitivity and kill probability was both expected
and uninformative.

Surrogate Measures

Unlike SDT measures, there is no self-evident logical relation
between surrogate measures and SA. The justification for using such
measures is purely empirical: they are correlated with an existing
measure of SA already believed to be valid. Of course, if a validated
measure already exists, there may be little need for another. Still,
one might desire a measure that is simpler or less costly to obtain
than the currently validated one.

Only one attempt to identify surrogate measures is known to have
been reported. In a complex study, Venturino, Hamilton, and Dvorchak
(1989) measured fire point selection (FPS, the point relative to an
enemy target at which pilots launch their missile) during simulated
air-to-air combat. The relationship of FPS to subjective self-ratings
of SA by the pilots was examined and found to be both non-linear and
non-monotonic. As a result, correlation coefficients were not
calculated. Nevertheless, the authors felt able to conclude that
"extreme or erratic FPS values may be an indicator of poor situation
awareness" (p. 4-4).

Reliability. No reliability studies of FPS or any other
potential surrogate measures are known to have been conducted.

Validity. The Achilles' heel of surrogate measures is the
assumption that one possesses a valid criterion measure to begin with.
In Venturino et al.'s (1989) study, however, the assumption is
problematic. While pilots' self-ratings of their own SA may sometimes
be valid, there is evidence that such is not always the case
(discussed below). Venturino et al. were aware of this difficulty and
did not base their conclusions on SA ratings alone. Nevertheless,
without a valid SA criterion measure, the conclusion that FPS measures
SA seems circular: SA is inferred from the measure that it is
supposed to explain.


Subjective Rating Measures

Subjective rating measures of SA are by far the easiest to
collect and so have proven popular (Arbak, Schwartz, and Kuperman,
1987; Fracker and Davis, 1990; Selcon and Taylor, 1989; Taylor, 1989;
Venturino, Hamilton, and Dvorchak, 1989; Ward and Hassoun, 1990). Two
classes of rating measures have been used: direct and comparative.

In direct ratings, pilots assign a numerical value to their SA during
a given mission (or mission segment). While pilots may make these
assignments in light of the ratings given to previous missions, the
rating technique does not inherently require them to do so (although
they may be instructed to do so). In any case, the assigned rating is
assumed to have some direct, monotonic relation to the absolute
magnitude of SA experienced during the mission. In comparative
ratings, pilots compare their SA during one mission to that during
another and assign a value to the ratio of one to the other. Thus, in
comparative ratings, no attempt is made to determine the location of
reported SA with respect to a fixed point on the underlying scale.
Rather, one obtains ratio estimates only. For example, values on an
underlying scale of 10 and 40 would appear identical to values of 100
and 400.
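
The ratio property can be illustrated with a short sketch. The
function name ratio_estimate is an assumption made for illustration
only; the numbers are the ones used in the example above.

```python
from fractions import Fraction


def ratio_estimate(sa_mission_a, sa_mission_b):
    """Return SA in mission B relative to SA in mission A as an exact ratio."""
    return Fraction(sa_mission_b, sa_mission_a)


# Two pairs of underlying scale values that differ by a factor of ten.
low_range = ratio_estimate(10, 40)
high_range = ratio_estimate(100, 400)

# Both pairs yield the same ratio, so the absolute location on the
# underlying scale cannot be recovered from the comparative rating.
print(low_range == high_range)
```

Because only the ratio survives, a comparative rating carries no
information about where either mission falls on the underlying scale.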

Direct Ratings

The most common direct rating measures have used Likert scales.
For example, Ward and Hassoun (1990) tested pilots' ability to recover
from unusual attitudes with three different types of head-up display
pitch ladders. Immediately following a trial, pilots were asked
whether they agreed with the statement "I experienced no confusion
with this pitch ladder configuration and was easily able to recover to
straight and level flight." Pilots responded with a number between 1
and 9 indicating their agreement with the statement (1 = "decidedly
disagree," 9 = "decidedly agree").

While Ward and Hassoun (1990) used only one rating scale, most
researchers have employed multiple scales on the hypothesis that SA is
a multi-dimensional construct (Arbak et al., 1987; Selcon and Taylor,
1989; Taylor, 1989; Venturino et al., 1987). Arbak et al. used six
rating scales derived from a definition of SA focusing on various
elements of air-to-air combat (e.g., friendly locations and actions,
enemy locations and actions, available options, and so on). A similar
approach appears to have been used by Venturino et al., although those
authors did not identify the scales used. Taylor (1989; Selcon and
Taylor, 1989) rejected Arbak et al.'s a priori approach to scale
construction, opting instead for an empirical approach. Beginning
with 44 possible SA dimensions, Taylor used principal components
analysis to identify three major factors since incorporated into the
Situational Awareness Rating Technique (SART): attentional demand,
attentional supply, and situational understanding. Taylor also
decomposed these three factors into ten components, but the stability
of these components is not currently known (cf., Harman, 1976).

Reliability. No reliability studies of direct ratings of
subjective SA are known to have been conducted.

Construct validity. No coherent theory currently exists either
of subjective SA or of how subjective SA might be mapped onto
Likert-type rating scales. Consequently, it is difficult to assess
just what it is that subjective SA ratings might actually measure. One
possibility is that subjective SA ratings are actually confidence
ratings; that is, ratings of one's confidence that one knows everything
that needs to be known. The usefulness of such confidence ratings
probably depends upon how they are related to momentary SA, a
relationship that has not yet been explored.

Taylor (1989) has explored the relationship between subjective SA
(SART's situational understanding scale) on one hand and attentional
load and effort on the other. The measures of load and effort were
subjective rather than objective, however (SART's attentional demand
and supply scales, respectively). In one experiment, subjective load
and effort were positively correlated (r = .60) but neither was
correlated with subjective SA (r's < .14). This result could mean
that as attentional load increased, effort may also have increased in
order to maintain SA at a relatively constant level. In a second
experiment, subjective load was correlated with subjective effort (r =
.53) but not subjective SA (r < .14); at the same time, subjective
effort was correlated with subjective SA (r = .65). These results are
also sensible; they could indicate that more effort was expended than
actually necessary to maintain SA in the face of increasing load.
Results in both experiments were apparently consistent with existing
theories of situation assessment (Endsley, 1988; Fracker, 1988).

Thus, Taylor's (1989) research suggests that SART may indeed possess
some degree of construct validity.

Content validity. Content validity has not always been an
objective of subjective ratings. Taylor (1989; Selcon and Taylor,
1989) has focused on establishing construct validity with little
effort to identify or sample relevant mission content domains. At
least one researcher has sought to establish content validity,
however: Arbak et al.'s (1987) six rating scales were a deliberate
attempt to sample the content domain of air-to-air combat. The
contrast between Taylor's and Arbak et al.'s research may point to the
difficulty of developing rating scales to establish both construct and
content validity simultaneously. In principle, such scales could be
developed by nesting content-oriented scales within construct-oriented
scales (or the other way around). Although such nested scales might
prove too complex in practice, their development and evaluation may be
a useful direction for future research.

Criterion validity. In spite of their appeal, subjective ratings
of SA, when taken alone, confront a major difficulty. While such
measures may be able to assess subjects' confidence in their own SA,
there is compelling evidence that this confidence is poorly related to
measures of mission success. For example, Venturino et al. (1989)
reported that pilots who rated their SA as high were as likely to have
performed well as poorly. An even more dramatic case has been
reported by Ward and Hassoun (1990). Those authors found that the HUD
pitch ladder which produced the best subjective SA ratings also
produced the greatest percentage of inverted recoveries: pilots
believed they were upright when in fact they were upside down! In
terms of evaluating that particular display, this outcome was highly
informative because it revealed that the display was not just
uninformative but was in fact dangerously misleading. These results
suggest that an objective assessment of SA lies in the inconsistency
between subjective SA ratings and the appropriate performance criteria
rather than in the ratings alone. Quantitative assessment of this
inconsistency may provide a useful index of SA and may be a fruitful
direction for future research.
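
One way such an inconsistency index might be computed is sketched
below. This is an illustrative assumption rather than a method taken
from the studies cited: the 1-to-9 rating range follows the Likert
scale described earlier, while the function names, the choice of
proportion of correct recoveries as the criterion, and all of the
numbers are hypothetical.

```python
def rescale_rating(rating, low=1, high=9):
    """Map a rating on a low..high scale onto the 0..1 interval."""
    return (rating - low) / (high - low)


def inconsistency(mean_rating, proportion_correct, low=1, high=9):
    """Rescaled self-rating minus performance.

    Positive values mean self-rated SA exceeds what performance
    supports; negative values mean the reverse.
    """
    return rescale_rating(mean_rating, low, high) - proportion_correct


# Invented data: (mean subjective rating, proportion of correct recoveries).
displays = {
    "display_a": (8.0, 0.60),
    "display_b": (6.0, 0.90),
}

for name, (rating, correct) in displays.items():
    print(name, round(inconsistency(rating, correct), 3))
```

Under these assumptions, a display that earns high ratings but poor
performance receives a large positive score, which is the pattern
reported for the misleading pitch ladder.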

An alternative approach would be to try and remove the
inconsistency between SA ratings and performance criteria. The
inconsistency probably arises because pilots do not know that they are
unaware of critical information. A procedure to eliminate the
inconsistency might be to make pilots aware of task outcome before
collecting their ratings. If Ward and Hassoun (1990) had first told
pilots whether they were inverted before collecting their ratings, the
results would undoubtedly have been quite different. Nevertheless,
the "improved" results would have been deceptive in another way:
while the ratings would have revealed the poor SA associated with the
troublesome display, they would have hidden the fact that the display
was actually misleading rather than just uninformative. Thus, what is
clear is that subjective SA ratings should not be used alone but
should be combined in some way with criterion measures of performance.

Comparative Ratings

Although direct subjective ratings may seem to assess the
magnitude of perceived SA, such ratings generally cannot be compared
across raters. A pilot who assigns his SA a rating of "9" may mean
the same thing as another who assigns her SA a rating of "7."
Nevertheless, if one is comparing SA across different missions, such
ratings can be compared within subjects if individual subjects are
consistent in how they map perceived SA onto the rating scale.

Whether subjects are in fact consistent is difficult to evaluate
empirically, however. For that reason, Fracker and Davis (1990)
proposed a subjective SA scaling technique which both encourages and
assesses consistency. In this technique, derived from Vidulich's
(1989) Subjective Workload Dominance (SWORD) technique (see also
Budescu, Zwick, and Rapoport, 1986; Hughes et al., 1990; Lodge, 1981;
Saaty, 1977; Ward and Hassoun, 1990), subjects first experience
several different experimental conditions and then judge how much
better SA in one condition was compared to another, for all possible
pairs. The fact that subjects directly compare conditions encourages
them to apply the same subjective scale to each condition, and the
resulting two-way matrix can be examined to determine the extent to
which subjects were in fact inconsistent.
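
A minimal sketch of how a two-way judgment matrix might be checked for
consistency follows. It assumes a reciprocal ratio matrix in the style
of Saaty (1977), where entry [i][j] is how many times better SA was in
condition i than in condition j. Whether Fracker and Davis (1990)
computed consistency in exactly this way is not stated here, so the
function names, the geometric-mean scaling, and the eigenvalue
approximation are illustrative assumptions.

```python
import math


def geometric_mean_weights(matrix):
    """Scale values as normalized row geometric means of the matrix."""
    n = len(matrix)
    means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


def consistency_index(matrix):
    """Approximate Saaty-style index: 0 when judgments are perfectly
    consistent, growing as they become less consistent."""
    n = len(matrix)
    w = geometric_mean_weights(matrix)
    # Approximate the principal eigenvalue by the mean of (A w)_i / w_i.
    lam = sum(
        sum(matrix[i][j] * w[j] for j in range(n)) / w[i] for i in range(n)
    ) / n
    return (lam - n) / (n - 1)


# Judgments that agree exactly with underlying scale values 4, 2, 1.
consistent = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

# Intransitive judgments: A beats B, B beats C, yet C beats A.
inconsistent = [
    [1.0, 2.0, 0.25],
    [0.5, 1.0, 2.0],
    [4.0, 0.5, 1.0],
]

print(round(consistency_index(consistent), 6))
print(round(consistency_index(inconsistent), 3))
```

The first matrix yields an index of essentially zero, while the
intransitive matrix yields a clearly positive value, which is the kind
of signal that would flag an inconsistent rater.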

Reliability. No studies of SA comparative rating reliability
have yet been conducted.

Construct validity. As with direct subjective rating measures,
it may be that comparative ratings are merely an alternative method
for assessing operators' confidence in their SA. Nevertheless,
Fracker and Davis (1990) provided evidence that such ratings may yet
possess a degree of construct validity. Using the combat task from
Fracker (1991), the researchers had subjects perform under two levels
of combat intensity (Low, High) and two levels of difficulty in
identifying objects as friend or foe (Easy, Hard). In addition to
paired-comparison ratings of SA, Fracker and Davis also collected
Subjective Workload Assessment Technique (SWAT) ratings of mental
workload for each of the four experimental conditions (Reid,
Shingledecker, and Eggemeier, 1981). SWAT ratings clearly
distinguished among the four conditions and ordered them from least to
most workload as follows: Low-Easy, Low-Hard, High-Easy, High-Hard.
Subjective SA ratings failed to distinguish between the Low-Hard and
High-Easy conditions but otherwise provided the same ordering from
best to poorest SA. (Within experimental conditions, the correlations
between SWAT and SA ratings were virtually zero.) These results may
indicate that subjects were able to maintain their SA from the
Low-Hard to the High-Easy condition by increasing the amount of mental
effort expended. Further, the same pattern found in the SA ratings
was also observed in FFN accuracy (the correlation between subjective
SA and FFN accuracy across experimental conditions was not strong,
however: r = .35).

Nevertheless, not all of the evidence from Fracker and Davis'
(1990) experiment supported the construct validity of the comparative
ratings. The major difficulty was that SWAT ratings dissociated from
threat avoidance failures as the experimental conditions increased in
difficulty (in the easiest condition, r = .44; in the most difficult,
r = -.11). Because avoidance failures were also a measure of mental
workload, this systematic dissociation complicates the interpretation
of SWAT as a measure of mental effort (cf., Yeh and Wickens, 1988) and
hence the interpretation of the paired-comparison SA ratings. As with
direct ratings, then, it seems prudent to avoid relying on comparative
ratings alone.

Content validity. In theory, comparative rating scales can be
constructed so that they at least appear to possess content validity.
To illustrate, suppose that several alternative cockpit displays were
being compared to determine which gives pilots the best SA. Two
approaches are possible. First, the displays could be compared on
each of several dimensions, one dimension at a time (e.g., locations
of enemy aircraft, status of enemy anti-aircraft artillery, movements
of enemy tank units, locations of friendly ground forces, etc.). If
four displays were to be compared on just four such dimensions, pilots
would have to make 4 x 6 = 24 separate comparisons; if eight
conditions were compared, the number of comparisons would increase to
112. Second, instead of comparing the displays along pre-determined
dimensions, empirical dimensions could be extracted using
multidimensional scaling methods (Torgerson, 1958). In order to
establish four stable dimensions, as many as 30 displays might need to
be compared, requiring 435 comparisons. Whether the first or second
approach is taken, it is clear that the needed number of comparisons
can become quite large very rapidly. As a result, a practical
approach to establishing the content validity of comparative ratings
seems unlikely.
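
The comparison counts quoted above follow from the number of distinct
pairs among n items, n(n - 1)/2, multiplied by the number of
dimensions rated. The sketch below reproduces the three figures; the
function name comparisons_needed is an illustrative assumption.

```python
from math import comb


def comparisons_needed(n_displays, n_dimensions=1):
    """Paired comparisons required: every pair of displays, once per dimension."""
    return comb(n_displays, 2) * n_dimensions


print(comparisons_needed(4, 4))   # 6 pairs x 4 dimensions = 24
print(comparisons_needed(8, 4))   # 28 pairs x 4 dimensions = 112
print(comparisons_needed(30))     # 435 pairs for multidimensional scaling
```

Because the pair count grows with the square of the number of
displays, doubling the displays roughly quadruples the rating burden.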

Criterion validity. In the experiment reported by Fracker and
Davis (1990), no systematic relationship between SA comparative
ratings and kill probability was found. Both within and across
experimental conditions, the correlation between the two was virtually
zero. When combined with the poor results obtained with direct rating
measures (see above), it appears that subjective SA ratings in general
cannot be used to predict criterion performance measures.

DIRECTIONS FOR FUTURE RESEARCH

Development of SA measurement methods has only just begun, and
this is evident in the preceding review. Several topics and problems
requiring further research still exist and have been identified
throughout the discussion. Some of these problems eventually may be
solved through continued research on existing SA measures, but new
measures doubtless will be needed as well. Although it is not yet
clear what those new measures should be, some possibilities suggest
themselves and are briefly discussed.

Continuing Research on Existing Measures

Reliability. The reliability of SA metrics will continue to
require research, especially because limitations in their validity may
well be caused by limited reliability. Explicit measures,
particularly location memory probes, appear to be highly unreliable.
The reasons for this unreliability need to be explored if the
situation is to be improved. In the meantime, probes of location
memory should be used with great caution.

Validity. The three categories of SA metrics (explicit,
implicit, and subjective ratings) each appear to have their own
strengths and weaknesses. In terms of construct validity, explicit
measures may fare more poorly than implicit measures because the
former seem to confound momentary with reflective SA whereas the
latter do not. On the other hand, explicit measures can probably
achieve high content validity more easily than can implicit measures.
Both the construct and criterion validity of subjective ratings are
questionable, but such ratings are usually easier to collect than
either their explicit or implicit counterparts. Thus, no one metric
seems adequate in and of itself. Perhaps these three classes of
methods are complementary, with each providing information not easily
available from the others. On the other hand, it may simply be that
the collective results of the three methodologies are no better
than any of the methodologies alone. To suggest an analogy, combining
a broken thermometer with a broken wind gauge will not provide a more
accurate assessment of the weather. Thus, whether explicit, implicit,
and subjective ratings are complementary and hence should be used
together is a question needing further study.

Research to more clearly establish or improve the construct
validity of the various measures continues to be needed.
Unfortunately, the cognitive theories that guide tests of construct
validity are currently in dispute (Fracker and Wickens, 1989; Hirst
and Kalmar, 1987; Navon, 1984). As a result, definitive tests of
construct validity may have to await resolution of some of these
theoretical controversies. Nevertheless, key issues that need to be
examined are (1) the relative contributions of momentary and
reflective SA to both concurrent and retrospective explicit measures,
(2) the sensitivity of all measures to attentional demand, mental
workload, and attention allocation strategies, and (3) the degree to
which implicit measures such as envelope sensitivity confound SA with
intervening processes such as decision making and response execution.
In addition to these issues, considerable work is needed with regard
to subjective SA ratings. The most immediate need is for a theory of
subjective SA and of how operators go about mapping their perceived SA
onto the provided rating scales.

Considerable effort may be needed to improve the content validity
of most measures. Existing explicit measures are quite limited in
their ability to capture higher levels of operator SA (such as goal
awareness). Implicit measures so far have been developed only in the
context of simple choices (e.g., whether or not to fire a weapon).
Extension of these measures to more complex choice situations seems to
be the next logical step in their development. Subjective rating
scales have so far focused either on construct or on content validity.
If rating scales are to continue playing a role in SA assessment, they
should be expanded to achieve both construct and content validity,
perhaps in the form of nested rating scales.

Establishing the criterion validity of SA metrics assumes that
well-established mission performance criteria exist, an assumption
that may not always be met. But where such criteria exist, explicit
SA measures in particular have not performed as well as might be
hoped. Perhaps this poor performance will improve when the
reliability problems are solved. In any case, criterion validity
should continue to be a focus in the development of explicit SA
assessment. Regarding subjective SA ratings, criterion validity can
probably be achieved only by incorporating performance criteria into
the measure, perhaps by quantifying the inconsistency between
self-assessed SA and the quality of actual mission performance.

Developing New SA Measurement Methods

Sarter and Woods (1991) have pointed out that new measures of SA
are still needed, especially measures that will establish better
content validity. For example, there is as yet no good way in which
to assess higher levels of SA such as goal or organization awareness.
Real-time assessment of these higher levels is difficult to imagine,
except possibly for the use of verbal protocols (Sarter and Woods,
1991). A verbal protocol is obtained by having operators verbalize
their thoughts as they carry out their missions. These protocols are
recorded on tape and later analyzed off line. Retrospective protocols
collected after the fact (e.g., Whitaker and Klein, 1987) may also be
useful in this regard but would encounter the problems discussed
earlier (under Explicit Measures). If either concurrent or
retrospective protocols are to be used, new methods of analyzing them
may need to be developed in order to reveal the higher levels of SA
latent within them.

Further development of subjective rating approaches to content
validity might also prove useful. In particular, organizational
psychologists have developed subjective rating methods for job
analysis (e.g., McCormick, 1976, 1979) that could possibly be adapted
to SA assessment. For example, the military aircraft cockpit might be
decomposed into individual displays and particular items of
information found on those displays. In the case of multi-function
displays, a three-level decomposition of displays-pages-information
might be needed. Following a mission, pilots might then rate each
item of information for a particular page or display. Some pilots
might rate how much time they spent attending to each item. Other (or
the same) pilots might indicate the importance of each item to the
mission. Still others might indicate how difficult the items were to
find or use. Items that were critical to mission success but
difficult to find or use could point to changes in the displays that
would improve SA. Items that were frequently attended but not
critical could indicate that the displays are badly formatted,
encouraging sub-optimal attention strategies. Finding and correcting
such formatting problems would also contribute to better SA.
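
The item-level rating scheme described above might be represented as
in the sketch below. Everything in it is hypothetical: the 1-to-7
scales, the thresholds, the field and function names, and the sample
items are assumptions made for illustration and are not drawn from
McCormick's job analysis methods or from any fielded instrument.

```python
from dataclasses import dataclass


@dataclass
class ItemRating:
    """One pilot's post-mission ratings of one item of information."""
    display: str
    page: str
    item: str
    importance: int  # assumed scale: 1 (unimportant) to 7 (critical)
    difficulty: int  # assumed scale: 1 (easy to find or use) to 7 (hard)
    attention: int   # assumed scale: 1 (rarely attended) to 7 (constantly)


def critical_but_difficult(ratings, high=5):
    """Items that matter to the mission yet are hard to find or use."""
    return [r.item for r in ratings
            if r.importance >= high and r.difficulty >= high]


def attended_but_not_critical(ratings, high=5, low=3):
    """Items that draw attention without mattering much to the mission."""
    return [r.item for r in ratings
            if r.attention >= high and r.importance <= low]


# Invented ratings using the displays-pages-information decomposition.
sample = [
    ItemRating("multi-function display", "threat page", "threat bearing", 7, 6, 4),
    ItemRating("multi-function display", "status page", "mission clock", 2, 1, 6),
    ItemRating("head-up display", "main", "pitch ladder", 6, 2, 6),
]

print(critical_but_difficult(sample))
print(attended_but_not_critical(sample))
```

Under these assumptions, the first list points to items whose
presentation might be changed to improve SA, and the second to items
that may be drawing attention away from more important information.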

In summary, much work is still needed before highly reliable,
well-validated measures of operator SA will be available. In the
meantime, the military services will continue searching for ways to
improve, and potential adversaries will continue looking for ways to
degrade, friendly combatants' SA.